Pure Strategy or Mixed Strategy?

Jun He*
Department of Computer Science
Aberystwyth University, Aberystwyth, SY23 3DB, U.K.
jun.he@ieee.org

Feidun He
School of Information Science and Technology, Southwest Jiaotong University
Chengdu, Sichuan, 610031, China

Hongbin Dong
College of Computer Science and Technology, Harbin Engineering University
Harbin, 150001, China

January 30, 2012



Abstract

Mixed strategy evolutionary algorithms (EAs) aim at integrating several mutation operators into a
single algorithm. However, no analysis has been made to answer the theoretical question: whether and
when is the performance of mixed strategy EAs better than that of pure strategy EAs? In this paper,
asymptotic convergence rate and asymptotic hitting time are proposed to measure the performance of
EAs. It is proven that the asymptotic convergence rate and asymptotic hitting time of any mixed strategy
(1+1) EA consisting of several mutation operators are not worse than those of the worst pure strategy
(1+1) EA using only one mutation operator. Furthermore, it is proven that if these mutation operators
are mutually complementary, then it is possible to design a mixed strategy (1+1) EA whose performance
is better than that of any pure strategy (1+1) EA using only one mutation operator.

Keywords: Mixed Strategy, Pure Strategy, Asymptotic Convergence Rate, Asymptotic Hitting 
Time, Hybrid Evolutionary Algorithms 



1 Introduction

Different search operators have been proposed and applied in EAs [1]. Each search operator has its own
advantages. Therefore, an interesting research issue is to combine the advantages of various operators
and then design more efficient hybrid EAs. Hybridization of evolutionary algorithms has become
popular due to its capability of handling some real-world problems [2].

Mixed strategy EAs, inspired by strategies and games [3], aim at integrating several mutation operators
into a single algorithm [4]. At each generation, an individual chooses one mutation operator according
to a strategy probability distribution. Mixed strategy evolutionary programming has been implemented for
continuous optimization, and experimental results show that it performs better than its rival, i.e., pure strategy
evolutionary programming, which utilizes a single mutation operator [5, 6].

However, no analysis has been made to answer the theoretical question: whether and when is the performance
of mixed strategy EAs better than that of pure strategy EAs? This paper aims at providing an initial
answer. In theory, many EAs can be regarded as a matrix iteration procedure. Following matrix iteration
analysis [7], the performance of EAs is measured by the asymptotic convergence rate, which is determined by the
spectral radius of a probability transition sub-matrix associated with an EA. Alternatively, the performance of
EAs can be measured by the asymptotic hitting time [8], which approximately equals the reciprocal of the
asymptotic convergence rate. Then a theoretical analysis is made to compare the performance of mixed strategy
and pure strategy EAs.

The rest of this paper is organized as follows. Section 2 describes pure strategy and mixed strategy EAs. 
Section 3 defines asymptotic convergence rate and asymptotic hitting time. Section 4 makes a comparison of 
pure strategy and mixed strategy EAs. Section 5 concludes the paper. 



2 Pure Strategy and Mixed Strategy EAs 

Before starting a theoretical analysis of mixed strategy EAs, we first demonstrate the result of a computational 
experiment. 



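Example 1. Consider an instance of the average capacity 0-1 knapsack problem:

$$\text{maximize } \sum_{i=1}^{10} v_i b_i, \quad b_i \in \{0, 1\},$$

$$\text{subject to } \sum_{i=1}^{10} w_i b_i \le C, \tag{1}$$

where $v_1 = 10$ and $v_i = 1$ for $i = 2, \dots, 10$; $w_1 = 9$ and $w_i = 1$ for $i = 2, \dots, 10$; $C = 9$.
The fitness function is that for $x = (b_1, \dots, b_{10})$,

$$f(x) = \begin{cases} \sum_{i=1}^{10} v_i b_i, & \text{if } \sum_{i=1}^{10} w_i b_i \le C, \\ 0, & \text{if } \sum_{i=1}^{10} w_i b_i > C. \end{cases}$$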



We consider two types of mutation operators: 

• s1: flip each bit $b_i$ with a probability 0.1;

• s2: flip each bit $b_i$ with a probability 0.9.

The selection operator is to accept a better offspring only.

Three (1+1) EAs are compared in the computational experiment: (1) EA(s1), which adopts s1 only, (2)
EA(s2), which adopts s2 only, and (3) EA(s1,s2), which chooses either s1 or s2 with a probability 0.5 at each generation.

Each of these three EAs is run 100 times independently. The computational experiment shows that EA(s1,s2)
always finds the optimal solution more quickly than the other two.
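The experiment in Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' code: the uniformly random initial individual, the cap `max_gens` on the number of generations, and all function names are assumptions introduced here.

```python
import random

V = [10] + [1] * 9   # values v_1, ..., v_10
W = [9] + [1] * 9    # weights w_1, ..., w_10
C = 9                # knapsack capacity
F_OPT = 10           # optimal fitness, reached by taking item 1 only


def fitness(x):
    """Total value if the capacity constraint holds, otherwise 0."""
    if sum(w * b for w, b in zip(W, x)) > C:
        return 0
    return sum(v * b for v, b in zip(V, x))


def mutate(x, p, rng):
    """Flip each bit independently with probability p."""
    return [1 - b if rng.random() < p else b for b in x]


def run_ea(flip_probs, rng, max_gens=100_000):
    """Run one (1+1) EA; return the generation at which the optimum is found.

    flip_probs lists the flip probabilities of the available operators; one is
    chosen uniformly at random each generation (a single entry gives a pure
    strategy EA). The random start and max_gens are assumptions of this sketch.
    """
    x = [rng.randint(0, 1) for _ in range(10)]
    for t in range(max_gens):
        if fitness(x) == F_OPT:
            return t
        y = mutate(x, rng.choice(flip_probs), rng)
        if fitness(y) > fitness(x):   # accept a better offspring only
            x = y
    return max_gens
```

Calling `run_ea([0.1], rng)`, `run_ea([0.9], rng)` and `run_ea([0.1, 0.9], rng)` repeatedly with `rng = random.Random(seed)` gives samples of the running time of EA(s1), EA(s2) and EA(s1,s2) respectively.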

This is a simple case study which shows that a mixed strategy EA performs better than a pure strategy EA. In
general, we need to answer the following theoretical question: whether and when do mixed strategy EAs
perform better than pure strategy EAs?

Consider an instance of the discrete optimization problem, which is to maximize an objective function
$f(x)$:

$$\max\{f(x);\ x \in S\}, \tag{2}$$

where $S$ is a finite set. For convenience of analysis, suppose that all constraints have been removed through
an appropriate penalty function method. Under this scenario, all points in $S$ are viewed as feasible solutions.
In evolutionary computation, $f(x)$ is called a fitness function.

The following notation is used in the algorithm and text thereafter. 

• $x, y, z \in S$ are called points in $S$, or individuals in EAs, or states in Markov chains.

• The optimal set $S_{\mathrm{opt}} \subseteq S$ is the set consisting of all optimal solutions to Problem (2), and the
non-optimal set is $S_{\mathrm{non}} := S \setminus S_{\mathrm{opt}}$.

• $t$ is the generation counter. A random variable $\Phi_t$ represents the state of the $t$-th generation parent;
$\Phi_{t+1/2}$ represents the state of the child which is generated through mutation.

The mutation and selection operators are defined as follows: 

• A mutation operator is a probability transition from $S$ to $S$. It is defined by a mutation probability
transition matrix $\mathbf{P}_m$ whose entries are given by

$$P_m(x, y), \quad x, y \in S. \tag{3}$$

• A strict elitist selection operator is a mapping from $S \times S$ to $S$, that is, for $x \in S$ and $y \in S$,

$$z = \begin{cases} x, & \text{if } f(y) \le f(x), \\ y, & \text{if } f(y) > f(x). \end{cases} \tag{4}$$

A pure strategy (1+1) EA, which utilizes only one mutation operator, is described in Algorithm 1.



Algorithm 1 Pure Strategy Evolutionary Algorithm EA(s)
input: fitness function;
generation counter $t \leftarrow 0$;
initialize $\Phi_0$;
while stopping criterion is not satisfied do
    $\Phi_{t+1/2} \leftarrow$ mutate $\Phi_t$ by mutation operator s;
    evaluate the fitness of $\Phi_{t+1/2}$;
    $\Phi_{t+1} \leftarrow$ select one individual from $\{\Phi_t, \Phi_{t+1/2}\}$ by strict elitist selection;
    $t \leftarrow t + 1$;
end while
output: the maximal value of the fitness function.



The stopping criterion is that the running stops once an optimal solution is found. If an EA cannot find 
an optimal solution, then it will not stop and the running time is infinite. This is common in the theoretical 
analysis of EAs. 

Let $s_1, \dots, s_\kappa$ be $\kappa$ mutation operators (called strategies). Algorithm 2 describes the procedure of a mixed
strategy (1+1) EA. At the $t$-th generation, one mutation operator is chosen from the $\kappa$ strategies according
to a strategy probability distribution

$$q_{s_1}(x), \dots, q_{s_\kappa}(x), \tag{5}$$

subject to $0 \le q_s(x) \le 1$ and $\sum_s q_s(x) = 1$.

Write this probability distribution in short by a vector $\mathbf{q}(x) = [q_s(x)]$.



Algorithm 2 Mixed Strategy Evolutionary Algorithm EA($s_1, \dots, s_\kappa$)
input: fitness function;
generation counter $t \leftarrow 0$;
initialize $\Phi_0$;
while stopping criterion is not satisfied do
    choose a mutation operator $s_k$ from $s_1, \dots, s_\kappa$;
    $\Phi_{t+1/2} \leftarrow$ mutate $\Phi_t$ by mutation operator $s_k$;
    evaluate $\Phi_{t+1/2}$;
    $\Phi_{t+1} \leftarrow$ select one individual from $\{\Phi_t, \Phi_{t+1/2}\}$ by strict elitist selection;
    $t \leftarrow t + 1$;
end while
output: the maximal value of the fitness function.
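The mixed strategy procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names, the representation of individuals as tuples of bits, the cap `max_gens`, and the One-Max usage example with two operators are all hypothetical choices made for illustration. A pure strategy EA is obtained by passing a single operator.

```python
import random


def mixed_strategy_ea(fitness, operators, q, x0, is_optimal, rng, max_gens=10_000):
    """Run a mixed strategy (1+1) EA and return (final individual, generations).

    q(x) returns the strategy probability distribution over `operators` in
    state x. max_gens is an assumed safety cap, not part of the algorithm.
    """
    x, t = x0, 0
    while not is_optimal(x) and t < max_gens:
        op = rng.choices(operators, weights=q(x))[0]   # choose a strategy
        y = op(x, rng)                                 # mutate the parent
        if fitness(y) > fitness(x):                    # strict elitist selection
            x = y
        t += 1
    return x, t


def flip_one_bit(x, rng):
    """Choose one bit uniformly at random and flip it."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]


def flip_each_bit(x, rng):
    """Flip each bit independently with probability 1/n."""
    p = 1.0 / len(x)
    return tuple(1 - b if rng.random() < p else b for b in x)


# Hypothetical usage: One-Max on 8 bits with a uniform strategy distribution.
n = 8
best, gens = mixed_strategy_ea(
    fitness=sum,
    operators=[flip_one_bit, flip_each_bit],
    q=lambda x: [0.5, 0.5],
    x0=(0,) * n,
    is_optimal=lambda x: sum(x) == n,
    rng=random.Random(1),
)
```

Because `q` receives the current state, the same function also covers state-dependent strategy distributions such as those constructed in Section 4.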



Pure strategy EAs can be regarded as a special case of mixed strategy EAs with only one strategy.
EAs can be classified into two types: 

• A homogeneous EA is an EA which applies the same mutation operators and same strategy probability 
distribution for all generations. 

• An inhomogeneous EA is an EA which doesn't apply the same mutation operators or same strategy 
probability distribution for all generations. 

This paper will only discuss homogeneous EAs mainly due to the following reason: 

• The probability transition matrices of an inhomogeneous EA may be chosen to be totally different at
different generations. This makes the theoretical analysis of an inhomogeneous EA extremely hard.



3 Asymptotic Convergence Rate and Asymptotic Hitting Time

Suppose that a homogeneous EA is applied to maximize a fitness function $f(x)$; then the population sequence
$\{\Phi_t, t = 0, 1, \dots\}$ can be modelled by a homogeneous Markov chain [12]. Let $\mathbf{P}$ be the probability
transition matrix, whose entries are given by

$$P(x, y) = P(\Phi_{t+1} = y \mid \Phi_t = x), \quad x, y \in S.$$

Starting from an initial state $x$, the mean number $m(x)$ of generations to find an optimal solution is called
the hitting time to the set $S_{\mathrm{opt}}$ [13]:

$$\tau(x) := \min\{t;\ \Phi_t \in S_{\mathrm{opt}} \mid \Phi_0 = x\},$$

$$m(x) := E[\tau(x)] = \sum_{t=0}^{+\infty} t\, P(\tau(x) = t).$$

Let us arrange all individuals in the order of their fitness from high to low: $x_1, x_2, \dots$; then their hitting
times are

$$m(x_1), m(x_2), \dots.$$

Denote them in short by a vector $\mathbf{m} = [m(x)]$.

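Write the transition matrix $\mathbf{P}$ in the canonical form [14],

$$\mathbf{P} = \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ * & \mathbf{T} \end{pmatrix}, \tag{6}$$

where $\mathbf{I}$ is a unit matrix and $\mathbf{0}$ a zero matrix. $\mathbf{T}$ denotes the probability transition sub-matrix among
non-optimal states, whose entries are given by

$$P(x, y), \quad x \in S_{\mathrm{non}},\ y \in S_{\mathrm{non}}.$$

The part $*$ plays no role in the analysis.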

Since $m(x) = 0$ for all $x \in S_{\mathrm{opt}}$, it is sufficient to consider $m(x)$ on non-optimal states $x \in S_{\mathrm{non}}$. For
simplicity of notation, the vector $\mathbf{m}$ will also denote the hitting times for all non-optimal states:
$[m(x)]$, $x \in S_{\mathrm{non}}$.

The Markov chain associated with an EA can be viewed as a matrix iterative procedure, where the
iterative matrix is the probability transition sub-matrix $\mathbf{T}$. Let $\mathbf{p}_0$ be the vector $[p_0(x)]$ which represents the
probability distribution of the initial individual:

$$p_0(x) := P(\Phi_0 = x), \quad x \in S_{\mathrm{non}},$$

and $\mathbf{p}_t$ the vector $[p_t(x)]$ which represents the probability distribution of the $t$-th generation individual:

$$p_t(x) := P(\Phi_t = x), \quad x \in S_{\mathrm{non}}.$$

If the spectral radius $\rho(\mathbf{T})$ of the matrix $\mathbf{T}$ satisfies $\rho(\mathbf{T}) < 1$, then we know [7]

$$\lim_{t \to \infty} \| \mathbf{p}_t \| = 0.$$

Following matrix iterative analysis [7], the asymptotic convergence rate of an EA is defined as below.

Definition 1. The asymptotic convergence rate of an EA for maximizing $f(x)$ is

$$R(\mathbf{T}) := -\ln \rho(\mathbf{T}), \tag{7}$$

where $\mathbf{T}$ is the probability transition sub-matrix restricted to non-optimal states and $\rho(\mathbf{T})$ its spectral radius.

Figure 1: The relationship between the asymptotic hitting time and asymptotic convergence rate:
$1/R(\mathbf{T}) < T(\mathbf{T}) < 1.5/R(\mathbf{T})$ if $\rho(\mathbf{T}) \ge 0.5$.




Asymptotic convergence rate is different from previous definitions of convergence rate based on matrix
norms or probability distribution [12].

Note: Asymptotic convergence rate depends on both the probability transition sub-matrix $\mathbf{T}$ and the fitness
function $f(x)$. Because the spectral radius of the probability transition matrix satisfies $\rho(\mathbf{P}) = 1$, $\rho(\mathbf{P})$ cannot
be used to measure the performance of EAs. Because the mutation probability transition matrix is the same
for all functions $f(x)$ and $\rho(\mathbf{P}_m) = 1$, $\rho(\mathbf{P}_m)$ cannot be used to measure the performance of EAs either.

If $\rho(\mathbf{T}) < 1$, then the hitting time vector satisfies (see Theorem 3.2 in [14])

$$\mathbf{m} = (\mathbf{I} - \mathbf{T})^{-1} \mathbf{1}. \tag{8}$$

The matrix $\mathbf{N} := (\mathbf{I} - \mathbf{T})^{-1}$ is called the fundamental matrix of the Markov chain, where $\mathbf{T}$ is the
probability transition sub-matrix restricted to non-optimal states.

The spectral radius p(N) of the fundamental matrix can be used to measure the performance of EAs too. 

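Definition 2. The asymptotic hitting time of an EA for maximizing $f(x)$ is

$$T(\mathbf{T}) = \begin{cases} \rho(\mathbf{N}) = \rho((\mathbf{I} - \mathbf{T})^{-1}), & \text{if } \rho(\mathbf{T}) < 1, \\ +\infty, & \text{if } \rho(\mathbf{T}) = 1, \end{cases}$$

where $\mathbf{T}$ is the probability transition sub-matrix restricted to non-optimal states and $\mathbf{N}$ is the fundamental
matrix.

From Lemma 5 in [8], we know the asymptotic hitting time is between the best and worst case hitting
times, i.e.,

$$\min\{m(x);\ x \in S_{\mathrm{non}}\} \le T(\mathbf{T}) \le \max\{m(x);\ x \in S_{\mathrm{non}}\}. \tag{9}$$

From Lemma 3 in [8], we know

Lemma 1. For any homogeneous (1+1) EA using strict elitist selection, it holds

$$\rho(\mathbf{T}) = \max\{P(x, x);\ x \in S_{\mathrm{non}}\},$$

$$\rho(\mathbf{N}) = \frac{1}{1 - \rho(\mathbf{T})}, \quad \text{if } \rho(\mathbf{T}) < 1.$$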
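The quantities above can be checked numerically on a small chain. The sketch below is illustrative only: the 3-by-3 sub-matrix `T` is a made-up example, and the code assumes that non-optimal states are ordered from high to low fitness, so that strict elitist selection makes `T` lower triangular and equation (8) can be solved by forward substitution.

```python
def hitting_times(T):
    """Solve (I - T) m = 1 for a lower triangular sub-matrix T."""
    n = len(T)
    m = [0.0] * n
    for i in range(n):
        # Row i of m = 1 + T m, using the already computed m[j] for j < i.
        m[i] = (1.0 + sum(T[i][j] * m[j] for j in range(i))) / (1.0 - T[i][i])
    return m


# Hypothetical sub-matrix among three non-optimal states; the missing
# probability mass in each row leads to the optimal set.
T = [
    [0.5, 0.0, 0.0],
    [0.3, 0.6, 0.0],
    [0.1, 0.2, 0.4],
]

m = hitting_times(T)                           # hitting time vector, eq. (8)
rho_T = max(T[i][i] for i in range(len(T)))    # Lemma 1: largest diagonal entry
asymptotic_hitting_time = 1.0 / (1.0 - rho_T)  # Lemma 1 with Definition 2
```

For this example the asymptotic hitting time lies between the smallest and largest entries of `m`, in line with inequality (9).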

From Lemma 1 and Taylor series, we get that

$$R(\mathbf{T})\, T(\mathbf{T}) = \sum_{k=1}^{\infty} \frac{1}{k} \left( \frac{1}{T(\mathbf{T})} \right)^{k-1}.$$

If we make a mild assumption $T(\mathbf{T}) \ge 2$ (i.e., the asymptotic hitting time is at least two generations),
then the asymptotic hitting time approximately equals the reciprocal of the asymptotic convergence rate
(see Figure 1).

Example 2. Consider the problem of maximizing the One-Max function:

$$f(x) = |x|,$$

where $x = (b_1 \cdots b_n)$ is a binary string, $n$ the string length and $|x| := \sum_{i=1}^{n} b_i$. The mutation operator used in
the (1+1) EA is to choose one bit randomly and then flip it.

Then the asymptotic convergence rate and asymptotic hitting time are

$$1/n < R(\mathbf{T}) < 1/(n-1),$$

$$T(\mathbf{T}) = n.$$
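The values in Example 2 can be reproduced directly from Lemma 1. The following sketch assumes $n \ge 2$ and uses a hypothetical helper name `one_max_rates`; it relies on the observation that, with $|x| = j$ ones, the child is accepted only when a zero bit is flipped, so $P(x, x) = j/n$.

```python
import math


def one_max_rates(n):
    """Return (rho, R, T) for the single-bit-flip (1+1) EA on One-Max, n >= 2."""
    # Non-optimal states have j = 0, ..., n-1 ones and stay put with probability j/n.
    stay = [j / n for j in range(n)]
    rho = max(stay)                  # Lemma 1: largest diagonal entry
    rate = -math.log(rho)            # Definition 1
    hitting = 1.0 / (1.0 - rho)      # Lemma 1 with Definition 2
    return rho, rate, hitting


rho, rate, hitting = one_max_rates(10)
```

For `n = 10` the rate falls strictly between `1/10` and `1/9`, and the asymptotic hitting time lies between `1/rate` and `1.5/rate`, which matches the relationship stated in the caption of Figure 1.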

4 A Comparison of Pure Strategy and Mixed Strategy 

In this section, subscripts $\mathbf{q}$ and $s$ are added to distinguish between a mixed strategy EA using a strategy
probability distribution $\mathbf{q}$ and a pure strategy EA using a pure strategy $s$. For example, $\mathbf{T}_{\mathbf{q}}$ denotes the
probability transition sub-matrix of a mixed strategy EA; $\mathbf{T}_s$ the transition sub-matrix of a pure strategy
EA.

Theorem 1. Let $s_1, \dots, s_\kappa$ be $\kappa$ mutation operators.

1. The asymptotic convergence rate of any mixed strategy EA consisting of these $\kappa$ mutation operators is
not smaller than that of the worst pure strategy EA using only one of these mutation operators;

2. and the asymptotic hitting time of any mixed strategy EA is not larger than that of the worst pure strategy EA
using only one of these mutation operators.

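Proof. (1) From Lemma 1 we know

$$\rho(\mathbf{T}_{\mathbf{q}}) = \max\Big\{ \sum_{k=1}^{\kappa} q_{s_k}(x) P_{s_k}(x, x);\ x \in S_{\mathrm{non}} \Big\}$$

$$\le \max\Big\{ \sum_{k=1}^{\kappa} q_{s_k}(x) \rho(\mathbf{T}_{s_k});\ x \in S_{\mathrm{non}} \Big\}$$

$$\le \max\{\rho(\mathbf{T}_{s_k});\ k = 1, \dots, \kappa\}.$$

Thus we get that

$$R(\mathbf{T}_{\mathbf{q}}) := -\ln \rho(\mathbf{T}_{\mathbf{q}}) \ge \min\{-\ln \rho(\mathbf{T}_{s_k});\ k = 1, \dots, \kappa\}.$$

(2) From Lemma 1 we know

$$\rho(\mathbf{N}) = \frac{1}{1 - \rho(\mathbf{T})},$$

then we get $\rho(\mathbf{N}_{\mathbf{q}}) \le \max\{\rho(\mathbf{N}_{s_k});\ k = 1, \dots, \kappa\}$. □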
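The first part of the proof only involves the diagonal entries $P_s(x, x)$, so it can be illustrated with a few numbers. In the sketch below the staying probabilities in `diagonals` and the uniform distribution `q` are made-up values, and `rho_mixed` is a hypothetical helper name; the code assumes, as in Lemma 1, that the spectral radius equals the largest diagonal entry.

```python
import math


def rho_mixed(diagonals, q):
    """Largest staying probability of the mixed strategy EA.

    diagonals[s][x] is P_s(x, x) for operator s in non-optimal state x;
    q[x][s] is the probability of choosing operator s in state x.
    """
    n_ops = len(diagonals)
    return max(
        sum(q[x][s] * diagonals[s][x] for s in range(n_ops))
        for x in range(len(q))
    )


# Hypothetical staying probabilities of two operators on three states.
diagonals = [
    [0.9, 0.2, 0.5],   # operator s1
    [0.3, 0.8, 0.6],   # operator s2
]
q = [[0.5, 0.5]] * 3   # choose either operator with probability 0.5 everywhere

rho_q = rho_mixed(diagonals, q)
rho_worst = max(max(d) for d in diagonals)   # worst pure strategy
rate_q = -math.log(rho_q)
rate_worst = -math.log(rho_worst)
```

Here `rho_q` is about 0.6 while the worst pure strategy has spectral radius 0.9, so `rate_q` is not smaller than `rate_worst`, as the theorem states.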

In the following we investigate whether and when the performance of a mixed strategy EA is better than 
a pure strategy EA. 

Definition 3. A mutation operator s2 is called complementary to another mutation operator s1 on a fitness
function $f(x)$ if for any $x$ such that

$$P_{s_1}(x, x) = \rho(\mathbf{T}_{s_1}), \tag{10}$$

it holds

$$P_{s_2}(x, x) < \rho(\mathbf{T}_{s_1}). \tag{11}$$

Theorem 2. Let $f(x)$ be a fitness function and EA(s1) a pure strategy EA. If a mutation operator s2 is
complementary to s1, then it is possible to design a mixed strategy EA(s1,s2) which satisfies

1. its asymptotic convergence rate is larger than that of EA(sl); 

2. and its asymptotic hitting time is shorter than that of EA(sl). 






Proof. (1) Design a mixed strategy EA(s1,s2) as follows. For any $x$ such that

$$P_{s_1}(x, x) = \rho(\mathbf{T}_{s_1}),$$

let the strategy probability distribution satisfy

$$q_{s_2}(x) = 1.$$

For any other $x$, let the strategy probability distribution satisfy

$$q_{s_1}(x) = 1.$$

Because s2 is complementary to s1, we get that

$$\rho(\mathbf{T}_{\mathbf{q}}) < \rho(\mathbf{T}_{s_1}),$$

and then

$$-\ln \rho(\mathbf{T}_{\mathbf{q}}) > -\ln \rho(\mathbf{T}_{s_1}),$$

which proves the first conclusion in the theorem.

(2) From Lemma 1,

$$\rho(\mathbf{N}) = \frac{1}{1 - \rho(\mathbf{T})},$$

we get that

$$\rho(\mathbf{N}_{\mathbf{q}}) < \rho(\mathbf{N}_{s_1}),$$

which proves the second conclusion in the theorem. □

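Definition 4. $\kappa$ mutation operators $s_1, \dots, s_\kappa$ are called mutually complementary on a fitness function $f(x)$
if for any $x \in S_{\mathrm{non}}$ and $s_l \in \{s_1, \dots, s_\kappa\}$ such that

$$P_{s_l}(x, x) \ge \min\{\rho(\mathbf{T}_{s_1}), \dots, \rho(\mathbf{T}_{s_\kappa})\}, \tag{12}$$

it holds: $\exists s_k \ne s_l$,

$$P_{s_k}(x, x) < \min\{\rho(\mathbf{T}_{s_1}), \dots, \rho(\mathbf{T}_{s_\kappa})\}. \tag{13}$$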

Theorem 3. Let $f(x)$ be a fitness function and $s_1, \dots, s_\kappa$ be $\kappa$ mutation operators. If these mutation
operators are mutually complementary, then it is possible to design a mixed strategy EA which satisfies

1. its asymptotic convergence rate is larger than that of any pure strategy EA using one mutation operator; 

2. and its asymptotic hitting time is shorter than that of any pure strategy EA using one mutation operator. 

Proof. (1) We design a mixed strategy EA($s_1, \dots, s_\kappa$) as follows. For any $x$ and any strategy $s_l \in \{s_1, \dots, s_\kappa\}$
such that

$$P_{s_l}(x, x) \ge \min\{\rho(\mathbf{T}_{s_1}), \dots, \rho(\mathbf{T}_{s_\kappa})\},$$

from the mutually complementary condition, we know that $\exists s_k \ne s_l$ such that

$$P_{s_k}(x, x) < \min\{\rho(\mathbf{T}_{s_1}), \dots, \rho(\mathbf{T}_{s_\kappa})\}.$$

Let the strategy probability distribution satisfy

$$q_{s_k}(x) = 1.$$

For any other $x$, we assign a strategy probability distribution in any way.
Because the mutation operators are mutually complementary, we get that

$$\rho(\mathbf{T}_{\mathbf{q}}) < \min\{\rho(\mathbf{T}_{s_1}), \dots, \rho(\mathbf{T}_{s_\kappa})\},$$

and then

$$-\ln \rho(\mathbf{T}_{\mathbf{q}}) > \max\{-\ln \rho(\mathbf{T}_{s_1}), \dots, -\ln \rho(\mathbf{T}_{s_\kappa})\},$$

which proves the first conclusion in the theorem.

(2) From Lemma 1,

$$\rho(\mathbf{N}) = \frac{1}{1 - \rho(\mathbf{T})},$$

we get that

$$\rho(\mathbf{N}_{\mathbf{q}}) < \rho(\mathbf{N}_{s_k}), \quad \forall k = 1, \dots, \kappa,$$

which proves the second conclusion in the theorem. □
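The construction used in this proof can be sketched as a small routine that works on the diagonal entries only. The sketch is an illustration under assumptions: the function name, the input layout and the example numbers are hypothetical, and in every state the routine simply picks the first operator whose staying probability is below the smallest pure strategy spectral radius, which is one admissible choice among those allowed by the proof.

```python
def design_mixed_strategy(diagonals):
    """Pick, for each state x, one operator s with q_s(x) = 1.

    diagonals[s][x] is P_s(x, x) for operator s in non-optimal state x.
    Raises ValueError if no operator qualifies in some state, i.e. if the
    operators are not mutually complementary.
    """
    rho_min = min(max(d) for d in diagonals)   # min over pure strategies of rho(T_s)
    chosen = []
    for x in range(len(diagonals[0])):
        candidates = [s for s, d in enumerate(diagonals) if d[x] < rho_min]
        if not candidates:
            raise ValueError(f"not mutually complementary at state {x}")
        chosen.append(candidates[0])
    return chosen


# Hypothetical staying probabilities of two operators on three states.
diagonals = [
    [1.0, 0.2, 0.7],   # operator s1, rho = 1.0
    [0.4, 0.9, 0.3],   # operator s2, rho = 0.9
]
chosen = design_mixed_strategy(diagonals)
rho_q = max(diagonals[s][x] for x, s in enumerate(chosen))
```

With these numbers the mixed strategy uses s2 in the first state and s1 elsewhere, giving a spectral radius of 0.7, which is below both 1.0 and 0.9.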

Example 3. Consider the problem of maximizing the following fitness function $f(x)$ (see Figure 2):

$$f(x) = \begin{cases} |x|, & \text{if } |x| < 0.5n \text{ and } |x| \text{ is even}; \\ |x| + 2, & \text{if } |x| < 0.5n \text{ and } |x| \text{ is odd}; \\ |x|, & \text{if } |x| \ge 0.5n, \end{cases}$$

where $x = (b_1 \cdots b_n)$ is a binary string, $n$ the string length and $|x| := \sum_{i=1}^{n} b_i$.

Figure 2: The shape of the function $f(x)$ in Example 3 when $n = 16$.

Consider two common mutation operators:

• s1: to choose one bit randomly and then flip it;

• s2: to flip each bit independently with a probability $1/n$.

EA(s1) uses the mutation operator s1 only. Then $\rho(\mathbf{T}_{s_1}) = 1$, and then the asymptotic convergence rate
is $R(\mathbf{T}_{s_1}) = 0$.

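EA(s2) utilizes the mutation operator s2 only. Then

$$\rho(\mathbf{T}_{s_2}) = 1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1}.$$

We have

$$\min\{\rho(\mathbf{T}_{s_1}), \rho(\mathbf{T}_{s_2})\} = 1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1}.$$

(1) For any $x$ such that

$$P_{s_1}(x, x) \ge 1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1},$$

we have

$$P_{s_1}(x, x) = 1,$$

and we know that

$$P_{s_2}(x, x) < 1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1}.$$

(2) For any $x$ such that

$$P_{s_2}(x, x) = \rho(\mathbf{T}_{s_2}) = 1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1},$$

we know that

$$P_{s_1}(x, x) = 1 - \frac{1}{n} < \rho(\mathbf{T}_{s_2}) = 1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1}.$$

Hence these two mutation operators are mutually complementary.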

We design a mixed strategy EA(s1,s2) as follows: let the strategy probability distribution satisfy

$$q_{s_1}(x) = \begin{cases} 0, & \text{if } |x| < 0.5n; \\ 1, & \text{if } |x| \ge 0.5n. \end{cases}$$

According to Theorem 3, the asymptotic convergence rate of this mixed strategy EA(s1,s2) is larger than
that of either EA(s1) or EA(s2).
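The comparison in Example 3 can be checked numerically, since both the fitness and the two operators depend on $x$ only through $|x|$. The sketch below is an illustration under assumptions: it groups states by their number of one-bits, uses the assumed string length `n = 64`, and all function names are hypothetical. Rather than taking the closed form of $\rho(\mathbf{T}_{s_2})$ for granted, it computes every staying probability exactly and takes the maximum, as in Lemma 1.

```python
import math


def fitness(j, n):
    """Fitness of any string with j one-bits in Example 3."""
    if j < 0.5 * n and j % 2 == 1:
        return j + 2
    return j


def stay_s1(j, n):
    """P_s1(x, x): flip one uniformly chosen bit, accept only a better child."""
    up = (n - j) / n if j < n and fitness(j + 1, n) > fitness(j, n) else 0.0
    down = j / n if j > 0 and fitness(j - 1, n) > fitness(j, n) else 0.0
    return 1.0 - up - down


def stay_s2(j, n):
    """P_s2(x, x): flip each bit independently with probability 1/n."""
    p = 1.0 / n
    accepted = 0.0
    for a in range(n - j + 1):        # number of zeros flipped to ones
        for b in range(j + 1):        # number of ones flipped to zeros
            if fitness(j + a - b, n) > fitness(j, n):
                accepted += (math.comb(n - j, a) * math.comb(j, b)
                             * p ** (a + b) * (1 - p) ** (n - a - b))
    return 1.0 - accepted


n = 64                  # assumed string length for this sketch
levels = range(n)       # non-optimal levels |x| = 0, ..., n-1
rho_s1 = max(stay_s1(j, n) for j in levels)
rho_s2 = max(stay_s2(j, n) for j in levels)
# Mixed strategy of the example: s2 below 0.5n, s1 from 0.5n upwards.
rho_q = max(stay_s2(j, n) if j < 0.5 * n else stay_s1(j, n) for j in levels)
```

For this value of `n` the computed `rho_s2` agrees with the closed form $1 - \frac{1}{n}(1 - \frac{1}{n})^{n-1}$, and `rho_q` is smaller than both `rho_s1` and `rho_s2`. Changing `n` reruns the same check for other string lengths.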



5 Conclusion and Discussion 

The result of this paper is summarized in three points. 

• Asymptotic convergence rate and asymptotic hitting time are proposed to measure the performance of
EAs. They have seldom been used to evaluate the performance of EAs before.

• It is proven that the asymptotic convergence rate and asymptotic hitting time of any mixed strategy
(1+1) EA consisting of several mutation operators are not worse than those of the worst pure strategy
(1+1) EA using only one of these mutation operators.

• Furthermore, if these mutation operators are mutually complementary, then it is possible to design a 
mixed strategy EA whose performance (asymptotic convergence rate and asymptotic hitting time) is 
better than that of any pure strategy EA using one mutation operator. 

One may argue that several mutation operators can be applied simultaneously, e.g., in a population-based
EA, different individuals adopt different mutation operators. However, in this case, the number of
fitness evaluations at each generation is larger than that of a (1+1) EA. Therefore, a fair comparison should
be a population-based mixed strategy EA against a population-based pure strategy EA. Due to the length
restriction, this issue will not be discussed in the paper.